Differentially Private EBMs

See the reference paper [1] for full details.

Code Example

The following code trains a DP-EBM classifier on the UCI Adult income dataset and then visualizes both its global and local explanations.

from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

from interpret.privacy import DPExplainableBoostingClassifier
from interpret import show

# Load the UCI Adult income dataset (the raw file has no header row)
df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
    header=None)
df.columns = [
    "Age", "WorkClass", "fnlwgt", "Education", "EducationNum",
    "MaritalStatus", "Occupation", "Relationship", "Race", "Gender",
    "CapitalGain", "CapitalLoss", "HoursPerWeek", "NativeCountry", "Income"
]
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# Fully specify the feature types so that auto-detection does not
# examine the data outside the privacy budget
feature_types = ['continuous', 'nominal', 'continuous', 'nominal',
    'continuous', 'nominal', 'nominal', 'nominal', 'nominal', 'nominal',
    'continuous', 'continuous', 'continuous', 'nominal']

# Known min/max values for the continuous features; without them DP-EBM
# warns and determines the bounds from the data
privacy_bounds = {"Age": (17, 90), "fnlwgt": (12285, 1484705),
    "EducationNum": (1, 16), "CapitalGain": (0, 99999),
    "CapitalLoss": (0, 4356), "HoursPerWeek": (1, 99)
}

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

# random_state=None keeps the DP noise non-repeatable, as privacy requires
dpebm = DPExplainableBoostingClassifier(random_state=None, epsilon=1.0, delta=1e-5,
    feature_types=feature_types, privacy_bounds=privacy_bounds)
dpebm.fit(X_train, y_train)

auc = roc_auc_score(y_test, dpebm.predict_proba(X_test)[:, 1])
print("AUC: {:.3f}".format(auc))
# Output from one run (it varies between runs because of the DP noise and the random split):
# AUC: 0.892

# Global explanation: one graph per term
show(dpebm.explain_global())

# Local explanations for the first five test samples; key 0 displays the first of them
show(dpebm.explain_local(X_test[:5], y_test[:5]), 0)

Bibliography

[1] Harsha Nori, Rich Caruana, Zhiqi Bu, Judy Hanwen Shen, and Janardhan Kulkarni. Accuracy, Interpretability, and Differential Privacy via Explainable Boosting. In Proceedings of the 38th International Conference on Machine Learning, 8227-8237. 2021.

API

DPExplainableBoostingClassifier

class interpret.privacy.DPExplainableBoostingClassifier(feature_names=None, feature_types=None, max_bins=32, exclude=[], validation_size=0, outer_bags=1, learning_rate=0.01, max_rounds=300, max_leaves=3, n_jobs=-2, random_state=None, epsilon=1.0, delta=1e-05, composition='gdp', bin_budget_frac=0.1, privacy_bounds=None)

Differentially Private Explainable Boosting Classifier. Note that many arguments have different defaults than in regular EBMs.

Parameters:
  • feature_names (list of str, default=None) – List of feature names.

  • feature_types (list of FeatureType, default=None) –

    List of feature types. For DP-EBMs, feature_types should be fully specified. The auto-detector, if used, examines the data and is not included in the privacy budget. If auto-detection is used, a privacy warning will be issued. FeatureType can be:

    • None: Auto-detect (privacy budget is not respected!).

    • ‘continuous’: Use private continuous binning.

    • [List of str]: Ordinal categorical where the order has meaning, e.g., [“low”, “medium”, “high”]. Uses private categorical binning.

    • ‘ordinal’: Ordinal categorical where the order is determined by sorting the feature strings. Uses private categorical binning.

    • ‘nominal’: Categorical where the order has no meaning, e.g., country names. Uses private categorical binning.

  • max_bins (int, default=32) – Max number of bins per feature.

  • exclude (list of tuples of feature indices|names, default=[]) – Features to be excluded.

  • validation_size (int or float, default=0) –

    Validation set size. A validation set is needed if outer bags or error bars are desired.

    • Integer (1 <= validation_size): Count of samples to put in the validation sets.

    • Float (0 < validation_size < 1.0): Fraction of the data to put in the validation sets.

    • 0: Outer bags have no utility and error bounds will be eliminated

  • outer_bags (int, default=1) – Number of outer bags. Outer bags are used to generate error bounds and help with smoothing the graphs.

  • learning_rate (float, default=0.01) – Learning rate for boosting.

  • max_rounds (int, default=300) – Total number of boosting rounds with n_terms boosting steps per round.

  • max_leaves (int, default=3) – Maximum number of leaves allowed in each tree.

  • n_jobs (int, default=-2) – Number of jobs to run in parallel. Negative integers are interpreted following joblib’s formula (n_cpus + 1 + n_jobs), just like scikit-learn. E.g., -2 means using all threads except 1.

  • random_state (int or None, default=None) – Random state. None uses device_random and generates non-repeatable sequences. Should be set to ‘None’ for privacy, but can be set to an integer for testing and repeatability.

  • epsilon (float, default=1.0) – Total privacy budget to be spent.

  • delta (float, default=1e-5) – Additive component of differential privacy guarantee. Should be smaller than 1/n_training_samples.

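  • composition ({'gdp', 'classic'}, default='gdp') – Composition method used to track how the privacy loss from repeated noise additions accumulates: 'gdp' for Gaussian differential privacy accounting or 'classic' for classic composition.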

  • bin_budget_frac (float, default=0.1) – Fraction of the total epsilon budget to use for private binning.

  • privacy_bounds (Union[np.ndarray, Mapping[Union[int, str], Tuple[float, float]]], default=None) – Specifies known min/max values for each feature. If None, DP-EBM shows a warning and uses the data to determine these values.
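
As a rough illustration of how epsilon, delta and bin_budget_frac relate, the sketch below splits a total epsilon in the way the parameter descriptions above suggest and checks the delta guideline. The helper names are hypothetical. The assumption that whatever is not spent on binning is left for boosting is inferred from the descriptions ("total privacy budget"), not taken from the library's code.

```python
from typing import Tuple


def split_epsilon(epsilon: float, bin_budget_frac: float) -> Tuple[float, float]:
    """Hypothetical helper: split a total epsilon into (binning, boosting) shares.

    Assumes the part not used for private binning is left for boosting.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not 0.0 < bin_budget_frac < 1.0:
        raise ValueError("bin_budget_frac must be strictly between 0 and 1")
    eps_binning = epsilon * bin_budget_frac
    eps_boosting = epsilon - eps_binning
    return eps_binning, eps_boosting


def delta_is_reasonable(delta: float, n_training_samples: int) -> bool:
    """Check the documented guideline that delta < 1 / n_training_samples."""
    return 0.0 < delta < 1.0 / n_training_samples


# Defaults from the signature above: epsilon=1.0, bin_budget_frac=0.1
eps_bin, eps_boost = split_epsilon(1.0, 0.1)
print(f"binning: {eps_bin:.2f}, boosting: {eps_boost:.2f}")  # binning: 0.10, boosting: 0.90

# The 80/20 split of the Adult data in the example above leaves 26,048 training rows
print(delta_is_reasonable(1e-5, 26_048))  # True: 1e-5 < 1/26048 (about 3.8e-5)
print(delta_is_reasonable(1e-4, 26_048))  # False
```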

Variables:
  • classes_ (array of bool, int, or unicode with shape (2,)) – The class labels. DPExplainableBoostingClassifier only supports binary classification, so there are 2 classes.

  • n_features_in_ (int) – Number of features.

  • feature_names_in_ (List of str) – Resolved feature names. Names can come from feature_names, X, or be auto-generated.

  • feature_types_in_ (List of str) – Resolved feature types. Can be: ‘continuous’, ‘nominal’, or ‘ordinal’.

  • bins_ (List[Union[List[Dict[str, int]], List[array of float with shape (n_cuts,)]]]) – Per-feature list that defines how to bin each feature. Each feature in the list contains a list of binning resolutions. The first item in the binning resolution list is for binning main effect features. If there are more items in the binning resolution list, they define the binning for successive levels of resolutions. The item at index 1, if it exists, defines the binning for pairs. The last binning resolution defines the bins for all successive interaction levels. If the binning resolution list contains dictionaries, then the feature is either a ‘nominal’ or ‘ordinal’ categorical. If the binning resolution list contains arrays, then the feature is ‘continuous’ and the arrays will contain float cut points that separate continuous values into bins.

  • feature_bounds_ (array of float with shape (n_features, 2)) – min/max bounds for each feature. feature_bounds_[feature_index, 0] is the min value of the feature and feature_bounds_[feature_index, 1] is the max value of the feature. Categoricals have min & max values of NaN.

  • term_features_ (List of tuples of feature indices) – Additive terms used in the model and their component feature indices.

  • term_names_ (List of str) – List of term names.

  • bin_weights_ (List of array of float with shape (n_bins)) – Per-term list of the total sample weights in each term’s bins.

  • bagged_scores_ (List of array of float with shape (n_outer_bags, n_bins)) – Per-term list of the bagged model scores.

  • term_scores_ (List of array of float with shape (n_bins)) – Per-term list of the model scores.

  • standard_deviations_ (List of array of float with shape (n_bins)) – Per-term list of the standard deviations of the bagged model scores.

  • link_ (str) – Link function used to convert the predictions or targets into linear space additive scores and vice versa via the inverse link. Possible values include: “custom_classification”, “logit”, “probit”, “cloglog”, “loglog”, “cauchit”

  • link_param_ (float) – Float value that can be used by the link function. For classification it is only used by “custom_classification”.

  • bag_weights_ (array of float with shape (n_outer_bags,)) – Per-bag record of the total weight within each bag.

  • breakpoint_iteration_ (array of int with shape (n_stages, n_outer_bags)) – The number of boosting rounds performed within each stage. Normally, the count of main effects boosting rounds will be in breakpoint_iteration_[0].

  • intercept_ (array of float with shape (1,)) – Intercept of the model.

  • noise_scale_binning_ (float) – The noise scale during binning.

  • noise_scale_boosting_ (float) – The noise scale during boosting.

copy()

Makes a deep copy of the EBM.

Returns:

The new copy.

decision_function(X, init_score=None)

Predicts scores from the model before the link function is applied.

Parameters:
  • X – Numpy array for samples.

  • init_score – Optional. Either a model that can generate scores or per-sample initialization scores. If per-sample scores are given, they should be the same length as X.

Returns:

The sum of the additive term contributions.

explain_global(name=None)

Provides global explanation for model.

Parameters:

name – User-defined explanation name.

Returns:

An explanation object, visualizing feature-value pairs as a horizontal bar chart.

explain_local(X, y=None, name=None, init_score=None)

Provides local explanations for provided samples.

Parameters:
  • X – Numpy array for X to explain.

  • y – Numpy vector for y to explain.

  • name – User-defined explanation name.

  • init_score – Optional. Either a model that can generate scores or per-sample initialization scores. If per-sample scores are given, they should be the same length as X.

Returns:

An explanation object, visualizing feature-value pairs for each sample as horizontal bar charts.

fit(X, y, sample_weight=None, bags=None, init_score=None)

Fits model to provided samples.

Parameters:
  • X – Numpy array for training samples.

  • y – Numpy array as training labels.

  • sample_weight – Optional array of weights per sample. Should be the same length as X and y.

  • bags – Optional bag definitions. The first dimension should have length equal to the number of outer_bags. The second dimension should have length equal to the number of samples. The contents should be +1 for training, -1 for validation, and 0 if not included in the bag. Numbers other than 1 indicate how many times to include the sample in the training or validation sets.

  • init_score – Optional. Either a model that can generate scores or per-sample initialization scores. If per-sample scores are given, they should be the same length as X.

Returns:

Itself.
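
The bags argument is a plain integer matrix, so it can be built with NumPy. The sketch below uses a hypothetical helper, make_bags. In each outer bag it marks a random fraction of samples as validation (-1) and leaves the rest as training (+1). It does not use 0 (excluded).

```python
import numpy as np


def make_bags(n_samples, outer_bags, validation_frac, seed=None):
    """Hypothetical helper that builds a bags matrix of shape (outer_bags, n_samples).

    +1 marks a training sample and -1 a validation sample; 0 (excluded) is not used here.
    """
    rng = np.random.default_rng(seed)
    n_validation = int(round(validation_frac * n_samples))
    bags = np.ones((outer_bags, n_samples), dtype=np.int64)
    for b in range(outer_bags):
        # Draw a fresh validation subset for each outer bag
        validation_idx = rng.choice(n_samples, size=n_validation, replace=False)
        bags[b, validation_idx] = -1
    return bags


bags = make_bags(n_samples=100, outer_bags=3, validation_frac=0.2, seed=0)
print(bags.shape)                # (3, 100)
print((bags == -1).sum(axis=1))  # [20 20 20]
```

Such a matrix could then be passed as fit(X, y, bags=bags) to an estimator constructed with the same outer_bags value.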

get_metadata_routing()

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

monotonize(term, increasing='auto', passthrough=0.0)

Adjusts a term to be monotone using isotonic regression. An important consideration is that this function only adjusts a single term and will not modify pairwise terms. When a feature needs to be globally monotonic, any pairwise terms that include the feature should be excluded from the model.

Parameters:
  • term – Index or name of the term to monotonize

  • increasing – ‘auto’ or bool. ‘auto’ decides direction based on Spearman correlation estimate.

  • passthrough – The process of monotonization can change the mean response of the model. If passthrough is set to 0.0, the model’s mean response to the training set will not change. If passthrough is set to 1.0, any change to the mean response made by monotonization will be passed through to self.intercept_. Values between 0 and 1 pass that fraction of the change through.

Returns:

Itself.
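
The core of monotonize is isotonic regression over a term's ordered bins. The sketch below is not interpret's implementation. It applies scikit-learn's IsotonicRegression to a made-up shape function, weighting each bin by a made-up sample weight in the spirit of bin_weights_. scikit-learn's increasing='auto' also picks the direction from the sign of a Spearman correlation estimate. The sketch ignores passthrough and any special bins the library may keep.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression


def monotonize_sketch(scores, weights, increasing="auto"):
    """Weighted isotonic fit of a 1-D term over its ordered bins (illustration only)."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    bin_positions = np.arange(len(scores))
    iso = IsotonicRegression(increasing=increasing)
    iso.fit(bin_positions, scores, sample_weight=weights)
    return iso.predict(bin_positions)


# Made-up per-bin scores with two small dips, and made-up per-bin weights
scores = [-0.4, -0.1, -0.2, 0.3, 0.25, 0.6]
weights = [50, 120, 80, 200, 40, 10]

mono = monotonize_sketch(scores, weights)
print(np.round(mono, 3))

# Pool-adjacent-violators keeps the weight-averaged score unchanged
print(np.isclose(np.average(mono, weights=weights), np.average(scores, weights=weights)))
```

Each violating run of bins is replaced by its weighted mean. Here bins 1 and 2 pool to -0.14, and bins 3 and 4 pool to 70/240.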

predict(X, init_score=None)

Predicts on provided samples.

Parameters:
  • X – Numpy array for samples.

  • init_score – Optional. Either a model that can generate scores or per-sample initialization scores. If per-sample scores are given, they should be the same length as X.

Returns:

Predicted class label per sample.

predict_proba(X, init_score=None)

Probability estimates on provided samples.

Parameters:
  • X – Numpy array for samples.

  • init_score – Optional. Either a model that can generate scores or per-sample initialization scores. If per-sample scores are given, they should be the same length as X.

Returns:

Probability estimate of sample for each class.

remove_features(features)

Removes features (and their associated components) from a fitted EBM. Note that this will change the structure (i.e., by removing the specified indices) of the following components of self: histogram_edges_, histogram_weights_, unique_val_counts_, bins_, feature_names_in_, feature_types_in_, and feature_bounds_. Also, any terms that use the features being deleted will be deleted. The following attributes that the caller passed to the __init__ function are not modified: feature_names and feature_types.

Parameters:

features – A list or enumerable of feature names or indices or booleans indicating which features to remove.

Returns:

Itself.

remove_terms(terms)

Removes terms (and their associated components) from a fitted EBM. Note that this will change the structure (i.e., by removing the specified indices) of the following components of self: term_features_, term_names_, term_scores_, bagged_scores_, standard_deviations_, and bin_weights_.

Parameters:

terms – A list (or other enumerable object) of term names or indices or booleans.

Returns:

Itself.

scale(term, factor)

Scales the individual term contribution by a constant factor. For example, you can nullify the contribution of specific terms by scaling them by a factor of zero; this would cause the associated global explanations (e.g., variable importance) to also be zero. Two things are worth noting: 1) this method has no effect on the fitted intercept, so users will have to change that attribute directly (if desired), and 2) reweighting specific term contributions will also reweight their related components in a similar manner (e.g., variable importance scores, standard deviations, etc.).

Parameters:
  • term – term index or name of the term to be scaled.

  • factor – The amount to scale the term by.

Returns:

Itself.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score – Mean accuracy of self.predict(X) w.r.t. y.

Return type:

float

set_decision_function_request(*, init_score='$UNCHANGED$')

Request metadata passed to the decision_function method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to decision_function if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to decision_function.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.

Parameters:

init_score (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for init_score parameter in decision_function.

Returns:

self – The updated object.

Return type:

object

set_fit_request(*, bags='$UNCHANGED$', init_score='$UNCHANGED$', sample_weight='$UNCHANGED$')

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.

Parameters:
  • bags (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for bags parameter in fit.

  • init_score (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for init_score parameter in fit.

  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_proba_request(*, init_score='$UNCHANGED$')

Request metadata passed to the predict_proba method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict_proba.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.

Parameters:

init_score (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for init_score parameter in predict_proba.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, init_score='$UNCHANGED$')

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.

Parameters:

init_score (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for init_score parameter in predict.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight='$UNCHANGED$')

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

sweep(terms=True, bins=True, features=False)

Purges unused elements from a fitted EBM.

Parameters:
  • terms – Boolean indicating if zeroed terms that do not affect the output should be purged from the model.

  • bins – Boolean indicating if unused bin levels that do not affect the output should be purged from the model.

  • features – Boolean indicating if features that are not used in any terms and therefore do not affect the output should be purged from the model.

Returns:

Itself.

term_importances(importance_type='avg_weight')

Provides the term importances.

Parameters:

importance_type – the type of term importance requested (‘avg_weight’, ‘min_max’)

Returns:

An array of term importances, with one importance per additive term.
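
The two importance types can be read as simple summaries of a term's scores. The sketch below is a hypothetical re-implementation for a single one-dimensional term. It assumes that 'avg_weight' means the bin-weight-averaged absolute score (with weights such as those in bin_weights_) and that 'min_max' means the range of the scores. It conveys the idea and is not meant to reproduce the library exactly.

```python
import numpy as np


def term_importance_sketch(scores, weights, importance_type="avg_weight"):
    """Hypothetical importance of one term from its per-bin scores and weights."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if importance_type == "avg_weight":
        # Weighted mean of absolute contributions: large if many samples get big scores
        return float(np.average(np.abs(scores), weights=weights))
    if importance_type == "min_max":
        # Spread between the lowest and the highest bin score
        return float(scores.max() - scores.min())
    raise ValueError(f"unknown importance_type: {importance_type!r}")


scores = [-0.5, 0.2, 1.0]
weights = [2, 1, 1]
print(term_importance_sketch(scores, weights))             # approximately 0.55 = (2*0.5 + 0.2 + 1.0) / 4
print(term_importance_sketch(scores, weights, "min_max"))  # 1.5 = 1.0 - (-0.5)
```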

to_json(file=None, indent=2, detail='all')

Exports the model to a JSON representation.

Parameters:
  • file – None to return a JSONable object, a path-like object (str or os.PathLike) to write a file, or a file-like object implementing .write().

  • indent – If indent is a non-negative integer or string, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0, negative, or "" will only insert newlines. None selects the most compact representation. Using a positive integer indent indents that many spaces per level. If indent is a string (such as "\t"), that string is used to indent each level.

  • detail – ‘minimal’, ‘interpretable’, ‘mergeable’, ‘all’

Returns:

JSONable object if file=None, otherwise returns None.

DPExplainableBoostingRegressor

class interpret.privacy.DPExplainableBoostingRegressor(feature_names=None, feature_types=None, max_bins=32, exclude=[], validation_size=0, outer_bags=1, learning_rate=0.01, max_rounds=300, max_leaves=3, n_jobs=-2, random_state=None, epsilon=1.0, delta=1e-05, composition='gdp', bin_budget_frac=0.1, privacy_bounds=None, privacy_target_min=None, privacy_target_max=None)

Differentially Private Explainable Boosting Regressor. Note that many arguments have different defaults than in regular EBMs.

Parameters:
  • feature_names (list of str, default=None) – List of feature names.

  • feature_types (list of FeatureType, default=None) –

    List of feature types. For DP-EBMs, feature_types should be fully specified. The auto-detector, if used, examines the data and is not included in the privacy budget. If auto-detection is used, a privacy warning will be issued. FeatureType can be:

    • None: Auto-detect (privacy budget is not respected!).

    • ‘continuous’: Use private continuous binning.

    • [List of str]: Ordinal categorical where the order has meaning, e.g., [“low”, “medium”, “high”]. Uses private categorical binning.

    • ‘ordinal’: Ordinal categorical where the order is determined by sorting the feature strings. Uses private categorical binning.

    • ‘nominal’: Categorical where the order has no meaning, e.g., country names. Uses private categorical binning.

  • max_bins (int, default=32) – Max number of bins per feature.

  • exclude (list of tuples of feature indices|names, default=[]) – Features to be excluded.

  • validation_size (int or float, default=0) –

    Validation set size. A validation set is needed if outer bags or error bars are desired.

    • Integer (1 <= validation_size): Count of samples to put in the validation sets.

    • Float (0 < validation_size < 1.0): Fraction of the data to put in the validation sets.

    • 0: Outer bags have no utility and error bounds will be eliminated

  • outer_bags (int, default=1) – Number of outer bags. Outer bags are used to generate error bounds and help with smoothing the graphs.

  • learning_rate (float, default=0.01) – Learning rate for boosting.

  • max_rounds (int, default=300) – Total number of boosting rounds with n_terms boosting steps per round.

  • max_leaves (int, default=3) – Maximum number of leaves allowed in each tree.

  • n_jobs (int, default=-2) – Number of jobs to run in parallel. Negative integers are interpreted following joblib’s formula (n_cpus + 1 + n_jobs), just like scikit-learn. E.g., -2 means using all threads except 1.

  • random_state (int or None, default=None) – Random state. None uses device_random and generates non-repeatable sequences. Should be set to ‘None’ for privacy, but can be set to an integer for testing and repeatability.

  • epsilon (float, default=1.0) – Total privacy budget to be spent.

  • delta (float, default=1e-5) – Additive component of differential privacy guarantee. Should be smaller than 1/n_training_samples.

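  • composition ({'gdp', 'classic'}, default='gdp') – Composition method used to track how the privacy loss from repeated noise additions accumulates: 'gdp' for Gaussian differential privacy accounting or 'classic' for classic composition.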

  • bin_budget_frac (float, default=0.1) – Fraction of the total epsilon budget to use for private binning.

  • privacy_bounds (Union[np.ndarray, Mapping[Union[int, str], Tuple[float, float]]], default=None) – Specifies known min/max values for each feature. If None, DP-EBM shows a warning and uses the data to determine these values.

  • privacy_target_min (float, default=None) – Known target minimum. ‘y’ values will be clipped to this min. If None, DP-EBM shows a warning and uses the data to determine this value.

  • privacy_target_max (float, default=None) – Known target maximum. ‘y’ values will be clipped to this max. If None, DP-EBM shows a warning and uses the data to determine this value.

Variables:
  • n_features_in_ (int) – Number of features.

  • feature_names_in_ (List of str) – Resolved feature names. Names can come from feature_names, X, or be auto-generated.

  • feature_types_in_ (List of str) – Resolved feature types. Can be: ‘continuous’, ‘nominal’, or ‘ordinal’.

  • bins_ (List[Union[List[Dict[str, int]], List[array of float with shape (n_cuts,)]]]) – Per-feature list that defines how to bin each feature. Each feature in the list contains a list of binning resolutions. The first item in the binning resolution list is for binning main effect features. If there are more items in the binning resolution list, they define the binning for successive levels of resolutions. The item at index 1, if it exists, defines the binning for pairs. The last binning resolution defines the bins for all successive interaction levels. If the binning resolution list contains dictionaries, then the feature is either a ‘nominal’ or ‘ordinal’ categorical. If the binning resolution list contains arrays, then the feature is ‘continuous’ and the arrays will contain float cut points that separate continuous values into bins.

  • feature_bounds_ (array of float with shape (n_features, 2)) – min/max bounds for each feature. feature_bounds_[feature_index, 0] is the min value of the feature and feature_bounds_[feature_index, 1] is the max value of the feature. Categoricals have min & max values of NaN.

  • term_features_ (List of tuples of feature indices) – Additive terms used in the model and their component feature indices.

  • term_names_ (List of str) – List of term names.

  • bin_weights_ (List of array of float with shape (n_bins)) – Per-term list of the total sample weights in each term’s bins.

  • bagged_scores_ (List of array of float with shape (n_outer_bags, n_bins)) – Per-term list of the bagged model scores.

  • term_scores_ (List of array of float with shape (n_bins)) – Per-term list of the model scores.

  • standard_deviations_ (List of array of float with shape (n_bins)) – Per-term list of the standard deviations of the bagged model scores.

  • link_ (str) – Link function used to convert the predictions or targets into linear space additive scores and vice versa via the inverse link. Possible values include: “custom_regression”, “power”, “identity”, “log”, “inverse”, “inverse_square”, “sqrt”

  • link_param_ (float) – Float value that can be used by the link function. The primary use is for the power link.

  • bag_weights_ (array of float with shape (n_outer_bags,)) – Per-bag record of the total weight within each bag.

  • breakpoint_iteration_ (array of int with shape (n_stages, n_outer_bags)) – The number of boosting rounds performed within each stage. Normally, the count of main effects boosting rounds will be in breakpoint_iteration_[0].

  • intercept_ (float) – Intercept of the model.

  • min_target_ (float) – The minimum value found in ‘y’, or privacy_target_min if provided.

  • max_target_ (float) – The maximum value found in ‘y’, or privacy_target_max if provided.

  • noise_scale_binning_ (float) – The noise scale during binning.

  • noise_scale_boosting_ (float) – The noise scale during boosting.

copy()

Makes a deep copy of the EBM.

Returns:

The new copy.

decision_function(X, init_score=None)

Predicts scores from the model before the link function is applied.

Parameters:
  • X – Numpy array for samples.

  • init_score – Optional. Either a model that can generate scores or per-sample initialization scores. If per-sample scores are given, they should be the same length as X.

Returns:

The sum of the additive term contributions.

explain_global(name=None)

Provides global explanation for model.

Parameters:

name – User-defined explanation name.

Returns:

An explanation object, visualizing feature-value pairs as a horizontal bar chart.

explain_local(X, y=None, name=None, init_score=None)

Provides local explanations for provided samples.

Parameters:
  • X – Numpy array for X to explain.

  • y – Numpy vector for y to explain.

  • name – User-defined explanation name.

  • init_score – Optional. Either a model that can generate scores or per-sample initialization scores. If per-sample scores are given, they should be the same length as X.

Returns:

An explanation object, visualizing feature-value pairs for each sample as horizontal bar charts.

fit(X, y, sample_weight=None, bags=None, init_score=None)

Fits model to provided samples.

Parameters:
  • X – Numpy array for training samples.

  • y – Numpy array as training labels.

  • sample_weight – Optional array of weights per sample. Should be the same length as X and y.

  • bags – Optional bag definitions. The first dimension should have length equal to the number of outer_bags. The second dimension should have length equal to the number of samples. The contents should be +1 for training, -1 for validation, and 0 if not included in the bag. Numbers other than 1 indicate how many times to include the sample in the training or validation sets.

  • init_score – Optional. Either a model that can generate scores or per-sample initialization score. If samples scores it should be the same length as X.

Returns:

Itself.
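A bags matrix in the encoding described above can be built by hand. This is a minimal sketch: the helper name make_bags, the 15% default validation share, and the exclude argument are illustrative assumptions, not part of the library. It only emits 0 and ±1; repeated inclusion (magnitudes above 1) is not modeled.

```python
import random


def make_bags(n_samples, outer_bags, validation_size=0.15, exclude=(), seed=0):
    """Hypothetical helper: build a bags matrix of shape (outer_bags, n_samples).

    Encoding follows the fit() docs: +1 = training, -1 = validation,
    0 = not included in the bag. Indices in `exclude` are 0 in every bag.
    """
    rng = random.Random(seed)
    excluded = set(exclude)
    eligible = [i for i in range(n_samples) if i not in excluded]
    n_val = int(round(len(eligible) * validation_size))
    bags = []
    for _ in range(outer_bags):
        order = eligible[:]
        rng.shuffle(order)
        bag = [0] * n_samples
        # The first n_val shuffled samples go to validation, the rest to training.
        for pos, idx in enumerate(order):
            bag[idx] = -1 if pos < n_val else 1
        bags.append(bag)
    return bags


bags = make_bags(n_samples=10, outer_bags=2, validation_size=0.2, exclude=[9])
```

Assuming a model constructed with outer_bags equal to the number of rows, the result could be converted with np.asarray(bags) and passed as fit(X, y, bags=...).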

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

monotonize(term, increasing='auto', passthrough=0.0)

Adjusts a term to be monotone using isotonic regression. This function adjusts only a single term and does not modify pairwise terms. When a feature needs to be globally monotonic, any pairwise terms that include the feature should be excluded from the model.

Parameters:
  • term – Index or name of the term to monotonize.

  • increasing – ‘auto’ or bool. ‘auto’ decides the direction based on a Spearman correlation estimate.

  • passthrough – The process of monotonization can change the mean response of the model. If passthrough is set to 0.0, the model’s mean response to the training set will not change. If passthrough is set to 1.0, any change to the mean response made by monotonization is passed through to self.intercept_. Values between 0 and 1 pass through that fraction of the change.

Returns:

Itself.
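The core of monotonization is weighted isotonic regression over a term's per-bin scores. The sketch below uses the pool-adjacent-violators algorithm (PAVA) and picks the ‘auto’ direction from the sign of an unweighted Spearman correlation between bin order and score. It is an illustration under those assumptions, not interpret's implementation; it assumes scores are already ordered by bin and ignores missing/unknown bins. Weighted PAVA preserves the weighted mean of the scores it is given, so the passthrough adjustment is not modeled here.

```python
def isotonic(scores, weights, increasing=True):
    """Weighted isotonic regression by pool-adjacent-violators (PAVA)."""
    if not increasing:
        # A decreasing fit is the negation of an increasing fit on negated scores.
        return [-v for v in isotonic([-s for s in scores], weights, True)]
    blocks = []  # each block: [weighted mean, total weight, number of bins]
    for s, w in zip(scores, weights):
        blocks.append([s, w, 1])
        # Pool backwards while the non-decreasing constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            if total > 0:
                mean = (m1 * w1 + m2 * w2) / total
            else:  # all-zero weights: fall back to an unweighted average
                mean = (m1 * c1 + m2 * c2) / (c1 + c2)
            blocks.append([mean, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def _ranks(values):
    """Average ranks (0-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0
        i = j + 1
    return ranks


def auto_increasing(scores):
    """Assumed 'auto' rule: sign of the Spearman correlation of bin order vs. score."""
    n = len(scores)
    r = _ranks(scores)
    mean_idx, mean_r = (n - 1) / 2.0, sum(r) / n
    cov = sum((i - mean_idx) * (ri - mean_r) for i, ri in enumerate(r))
    return cov >= 0
```

For example, isotonic(scores, bin_weights, auto_increasing(scores)) returns a monotone version of one term's scores, with heavily weighted bins pulling pooled values toward themselves.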

predict(X, init_score=None)

Predicts on the provided samples.

Parameters:
  • X – Numpy array for samples.

  • init_score – Optional. Either a model that can generate scores or a per-sample initialization score. If per-sample scores are given, they should have the same length as X.

Returns:

Predicted value per sample.

remove_features(features)

Removes features (and their associated components) from a fitted EBM. Note that this will change the structure (i.e., by removing the specified indices) of the following components of self: histogram_edges_, histogram_weights_, unique_val_counts_, bins_, feature_names_in_, feature_types_in_, and feature_bounds_. Also, any terms that use the features being deleted will be deleted. The following attributes that the caller passed to the __init__ function are not modified: feature_names, and feature_types.

Parameters:

features – A list or enumerable of feature names or indices or booleans indicating which features to remove.

Returns:

Itself.
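Removing a feature also removes every term that uses it, and the surviving terms must be re-indexed against the shortened feature list. The sketch below shows that bookkeeping on plain lists. The helper names and the tuple-of-feature-indices representation of terms are assumptions for illustration, not the library's code.

```python
def resolve(spec, names):
    """Hypothetical helper: turn names, indices, or a boolean mask into indices."""
    spec = list(spec)
    # bool is a subclass of int, so check for a boolean mask first.
    if spec and all(isinstance(s, bool) for s in spec):
        if len(spec) != len(names):
            raise ValueError("boolean mask must have one entry per feature")
        return {i for i, flag in enumerate(spec) if flag}
    return {names.index(s) if isinstance(s, str) else int(s) for s in spec}


def remove_features(feature_names, term_features, spec):
    """Drop features, drop every term touching them, and re-index the rest."""
    drop = resolve(spec, feature_names)
    kept = [i for i in range(len(feature_names)) if i not in drop]
    remap = {old: new for new, old in enumerate(kept)}
    new_names = [feature_names[i] for i in kept]
    new_terms = [
        tuple(remap[f] for f in term)
        for term in term_features
        if not drop.intersection(term)
    ]
    return new_names, new_terms


names = ["Age", "Education", "HoursPerWeek"]
terms = [(0,), (1,), (2,), (0, 2), (1, 2)]  # three mains and two pairs
```

All three selection styles (names, indices, or a boolean mask) resolve to the same set, so remove_features(names, terms, ["Education"]) drops the Education main effect and the Education x HoursPerWeek pair.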

remove_terms(terms)

Removes terms (and their associated components) from a fitted EBM. Note that this will change the structure (i.e., by removing the specified indices) of the following components of self: term_features_, term_names_, term_scores_, bagged_scores_, standard_deviations_, and bin_weights_.

Parameters:

terms – A list (or other enumerable object) of term names or indices or booleans.

Returns:

Itself.

scale(term, factor)

Scales the individual term contribution by a constant factor. For example, you can nullify the contribution of specific terms by setting their scaling factor to zero; this would cause the associated global explanations (e.g., variable importance) to also be zero. Two things are worth noting: 1) this method has no effect on the fitted intercept, so users will have to change that attribute directly (if desired), and 2) reweighting specific term contributions will also reweight their related components in a similar manner (e.g., variable importance scores, standard deviations, etc.).

Parameters:
  • term – term index or name of the term to be scaled.

  • factor – The amount to scale the term by.

Returns:

Itself.
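The effect of scaling can be seen on a toy model. In this sketch the model is a hypothetical dict of per-bin scores plus an intercept, and term importance is assumed to be the bin-weighted mean absolute score; neither is taken from the library's internals. Scaling by 0 zeroes both the term's scores and its importance, while the intercept is left untouched.

```python
def scale_term(model, term, factor):
    """Return a copy of `model` with one term's scores multiplied by `factor`.

    `model` is a hypothetical dict: {"intercept": float, "terms": {name: [bin scores]}}.
    The intercept is deliberately left untouched.
    """
    terms = {name: list(scores) for name, scores in model["terms"].items()}
    terms[term] = [s * factor for s in terms[term]]
    return {"intercept": model["intercept"], "terms": terms}


def mean_abs_importance(scores, weights):
    """Assumed importance measure: bin-weighted mean absolute score."""
    return sum(abs(s) * w for s, w in zip(scores, weights)) / sum(weights)


model = {"intercept": 0.5, "terms": {"Age": [-1.0, 2.0], "Hours": [0.5, -0.5]}}
bin_weights = [3, 1]

zeroed = scale_term(model, "Age", 0.0)
doubled = scale_term(model, "Age", 2.0)
```

Since the importance here is linear in the absolute scores, doubling a term doubles its importance, consistent with the note that related components are reweighted in a similar manner.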

score(X, y, sample_weight=None)

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score – \(R^2\) of self.predict(X) w.r.t. y.

Return type:

float

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
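The \(R^2\) definition above translates directly into a few lines of plain Python. This sketch uses a weighted mean of y_true when sample_weight is given and simply raises when \(v = 0\) (constant targets) instead of reproducing scikit-learn's special-case handling.

```python
def r2(y_true, y_pred, sample_weight=None):
    """Coefficient of determination: 1 - u / v, optionally sample-weighted."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    total_w = sum(sample_weight)
    mean = sum(w * y for w, y in zip(sample_weight, y_true)) / total_w
    # u: residual sum of squares; v: total sum of squares around the mean.
    u = sum(w * (yt - yp) ** 2 for w, yt, yp in zip(sample_weight, y_true, y_pred))
    v = sum(w * (yt - mean) ** 2 for w, yt in zip(sample_weight, y_true))
    if v == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - u / v


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
```

A constant predictor that always outputs the mean of y_true makes \(u = v\), which is why it scores exactly 0.0.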

set_decision_function_request(*, init_score='$UNCHANGED$')

Request metadata passed to the decision_function method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to decision_function if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to decision_function.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.

Parameters:

init_score (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for init_score parameter in decision_function.

Returns:

self – The updated object.

Return type:

object

set_fit_request(*, bags='$UNCHANGED$', init_score='$UNCHANGED$', sample_weight='$UNCHANGED$')

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.

Parameters:
  • bags (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for bags parameter in fit.

  • init_score (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for init_score parameter in fit.

  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, init_score='$UNCHANGED$')

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.

Parameters:

init_score (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for init_score parameter in predict.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight='$UNCHANGED$')

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

sweep(terms=True, bins=True, features=False)

Purges unused elements from a fitted EBM.

Parameters:
  • terms – Boolean indicating if zeroed terms that do not affect the output should be purged from the model.

  • bins – Boolean indicating if unused bin levels that do not affect the output should be purged from the model.

  • features – Boolean indicating if features that are not used in any terms and therefore do not affect the output should be purged from the model.

Returns:

Itself.
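The terms and features options of sweep can be pictured with a small sketch: drop terms whose score tables are entirely zero, then report the features that no remaining term uses. The representation (terms as tuples of feature indices, each term's scores flattened into one list) and the function name are assumptions for illustration; bin purging is not modeled.

```python
def sweep_sketch(term_features, term_scores, n_features):
    """Hypothetical sketch: purge all-zero terms and find now-unused features.

    term_features: list of tuples of feature indices, one tuple per term.
    term_scores:   list of flattened per-bin score lists, one per term.
    """
    kept = [i for i, scores in enumerate(term_scores) if any(v != 0.0 for v in scores)]
    new_term_features = [term_features[i] for i in kept]
    new_term_scores = [term_scores[i] for i in kept]
    used = {f for feats in new_term_features for f in feats}
    unused_features = [f for f in range(n_features) if f not in used]
    return new_term_features, new_term_scores, unused_features


term_features = [(0,), (1,), (2,), (0, 1)]
term_scores = [[0.1, -0.1], [0.0, 0.0], [0.0, 0.0], [0.0, 0.2]]
result = sweep_sketch(term_features, term_scores, n_features=3)
```

Feature 1 survives here because the pair (0, 1) still uses it, which mirrors the wording that only features "not used in any terms" are purged.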

term_importances(importance_type='avg_weight')

Provides the term importances.

Parameters:

importance_type – The type of term importance requested (‘avg_weight’ or ‘min_max’).

Returns:

An array of term importances, with one importance per additive term.
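The two importance types can be contrasted on a single term's scores. The definitions below are assumptions made for illustration rather than documented formulas: ‘avg_weight’ is taken as the bin-weighted mean absolute score, and ‘min_max’ as the spread between the largest and smallest bin score.

```python
def term_importance(scores, weights, importance_type="avg_weight"):
    """Assumed definitions of the two importance types for one term."""
    if importance_type == "avg_weight":
        # Bin-weighted mean of absolute contributions.
        return sum(abs(s) * w for s, w in zip(scores, weights)) / sum(weights)
    if importance_type == "min_max":
        # Range of the term's contribution across bins.
        return max(scores) - min(scores)
    raise ValueError("importance_type must be 'avg_weight' or 'min_max'")


scores = [-1.0, 0.5, 2.0]
weights = [2, 1, 1]
```

Under these definitions ‘avg_weight’ reflects how much a term typically moves the prediction on the training distribution, while ‘min_max’ reflects the largest swing it can produce regardless of how rare the extreme bins are.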

to_json(file=None, indent=2, detail='all')

Exports the model to a JSON representation.

Parameters:
  • file – None to return a JSONable object, a path-like object (str or os.PathLike) to write a file, or a file-like object implementing .write().

  • indent – If indent is a non-negative integer or string, JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0, a negative number, or “” only inserts newlines. None selects the most compact representation. A positive integer indents that many spaces per level. If indent is a string (such as “\t”), that string is used to indent each level. The default is 2.

  • detail – One of ‘minimal’, ‘interpretable’, ‘mergeable’, or ‘all’.

Returns:

JSONable object if file=None, otherwise returns None.
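The indent rules above match those of Python's standard json module, so their effect can be previewed without a fitted model. The dict below is a hypothetical stand-in for the JSONable object returned when file=None; the commented call shows the assumed shape of a real export using the documented signature.

```python
import json

# Hypothetical stand-in for the JSONable object returned by to_json(file=None).
model_json = {"intercept": 0.5, "scores": [1, 2]}

compact = json.dumps(model_json, indent=None)     # most compact form
newlines_only = json.dumps(model_json, indent=0)  # newlines, no indentation
pretty = json.dumps(model_json, indent=2)         # two spaces per level
tabbed = json.dumps(model_json, indent="\t")      # a string indent is used verbatim

# With a fitted model, a file export might look like (not executed here):
# model.to_json("model.json", indent=2, detail="interpretable")
```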